A Memory Subsystem Consisting of an On-Board Crossbar, Level-2 Cache, and Memory Controllers for a Highly
Authors
Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun
Abstract
0272-1732/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society.
Over the past two decades, microprocessor designers have focused on improving the performance of a single thread in a desktop processing environment by increasing frequencies and exploiting instruction-level parallelism (ILP) using techniques such as multiple instruction issue, out-of-order issue, and aggressive branch prediction. The emphasis on single-thread performance has shown diminishing returns because of the limitations in terms of latency to main memory and the inherently low ILP of applications. This has led to an explosion in microprocessor design complexity and made power dissipation a major concern. For these reasons, Sun Microsystems' Niagara processor takes a radically different approach to microprocessor design. Instead of focusing on the performance of single or dual threads, Sun optimized Niagara for multithreaded performance in a commercial server environment. This approach increases application performance by improving throughput, the total amount of work done across multiple threads of execution. This is especially effective in commercial server applications such as databases and Web services, which tend to have workloads with large amounts of thread-level parallelism (TLP). In this article, we present the Niagara processor's architecture. This is an entirely new implementation of the Sparc V9 architectural specification, which exploits large amounts of on-chip parallelism to provide high throughput. Niagara supports 32 hardware threads by combining ideas from chip multiprocessors and fine-grained multithreading. Other studies have also indicated the significant performance gains possible using this approach on multithreaded workloads. The parallel execution of many threads effectively hides memory latency. However, having 32 threads places a heavy demand on the memory system to support high band...
Similar resources
A Study of Multithreaded Benchmarks on the Hewlett-Packard X- and V-Class Architectures
The Hewlett-Packard X- and V-Class ccNUMA systems appear well suited to exploiting coarse- and fine-grained parallelism using multithreading techniques. This paper briefly summarizes the multilevel memory subsystem for the X- and V-Class platforms. Typical MPP distributed memory programming concerns for the codes under investigation, such as explicit memory localization and load balancing, are compare...
Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability
We present the hardware design and implementation of a local memory system for individual processors inside future chip multiprocessors (CMP). Our memory system supports both implicit communication via caches, and explicit communication via directly accessible local ("scratchpad") memories and remote DMA (RDMA). We provide run-time configurability of the SRAM blocks that lie near each processor...
Reduction in Cache Memory Power Consumption based on Replacement Quantity
Today power consumption is considered to be one of the important issues. Therefore, its reduction plays a considerable role in developing systems. Previous studies have shown that approximately 50% of total power consumption is used in cache memories. There is a direct relationship between power consumption and replacement quantity made in cache. The less the number of replacements is, the less...
DRAM Caching
This paper presents methods to reduce memory latency in the main memory subsystem below the board-level cache. We consider conventional page-mode DRAMs and cached DRAMs. Evaluation is performed via trace-driven simulation of a suite of nine benchmarks. In the case of page-mode DRAMs we show that it can be detrimental to use page-mode naively. We propose two enhancements that reduce overall memo...
Publication date: 2005